These instructions are essential, so please read them all carefully.
Submit your homework on your GitHub page as the RMarkdown (.Rmd) and HTML files.
Please answer the question prompt and show your code (inline). That is, all your code should be visible in the knitted chunks.
To complete this homework, you may write in the HW4.Rmd file.
(It is recommended to complete this homework in R Studio, where clicking the
Knit button would knit your homework.)
Note: Most of your R code for this homework will not be in this R Markdown file.
You mainly write .R files inside the R or tests folders.
Therefore, you only need to show a little code inside this R Markdown file.
You only need to write things inside this R Markdown file in the questions that explicitly ask you to do so.
Please use https://docs.google.com/document/d/1-ZTDPp39zbhsKbQ2FfBoCtiXKQj0jQSg75MAatYfiGI/edit?usp=sharing as a resource for this homework.
Please be respectful to your peers for this homework.
Each student’s compute_maximal_partial_clique() function is anonymized and
being used by every other student for this homework.
Some implementations of this function work smoother than others.
Nonetheless, 1) please be respectful to all the implementations throughout this homework, as you might not realize you’re talking to the author of an implementation, and 2) please do not feel embarrassed if your implementation (you know which one is yours, since you wrote it) does not work as well as you initially thought. Please do not try to “figure out” which student wrote which implementation.
I am overwhelmingly confident that you all wrote an implementation that suits your current comfort in R (which, of course, is different from student to student).
Intent: The intent of this question is to review other implementations. This is to see how your sense of coding compares to your peers.
Note 1: My instructions in HW3 were unclear about how to define the density of a clique size of one (i.e., just one node) – I (Kevin) apologize. Let’s define the edge density of 1 node to be 1 (so returning a set of 1 node is always a valid clique, albeit not the largest one).
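To make the convention in Note 1 concrete, here is a minimal sketch of an edge-density helper. The name `edge_density_of` is hypothetical (it is not `compute_correct_density()` from the course materials), and it assumes that density means the number of edges among the selected nodes divided by the number of possible pairs, ignoring the diagonal.

```r
# Hypothetical helper, not part of the course materials.
# Assumes adj_mat is a symmetric 0/1 matrix and idx is a vector of node indices.
edge_density_of <- function(adj_mat, idx) {
  m <- length(idx)
  # Convention from Note 1: a single node has edge density 1.
  if (m == 1) return(1)
  sub <- adj_mat[idx, idx, drop = FALSE]
  # Count each undirected edge once by looking only above the diagonal.
  n_edges <- sum(sub[upper.tri(sub)])
  n_edges / choose(m, 2)
}

# Toy graph on 4 nodes with edges 1-2 and 2-3.
adj_mat <- diag(4)
adj_mat[1, 2] <- adj_mat[2, 1] <- 1
adj_mat[2, 3] <- adj_mat[3, 2] <- 1

edge_density_of(adj_mat, 1)           # single node: 1 by definition
edge_density_of(adj_mat, c(1, 2, 3))  # 2 of the 3 possible edges are present
```

Under this convention, a one-node answer always meets any alpha between 0 and 1, which is why it is always valid but rarely the largest.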
In your GitHub Issues for HW3, I have assigned each of you two compute_maximal_partial_clique implementations to review. For example, if you were assigned #1 and #3, you will be reviewing the implementations of
compute_maximal_partial_clique1() and compute_maximal_partial_clique3().
Question 1A: We will have gone over in Lecture 8 how to install all the compute_maximal_partial_clique implementations into your UWBiost561 package. In short, you will find a zip file on Canvas called hw4_implementations.zip. Download it, and unzip the file. You should see a bunch of functions under the files-to-put-into-R-folder folder, including a compute_maximal_partial_clique_master.R and many compute_maximal_partial_clique implementations (each as its own .R file).
Copy and paste all these files into your R folder in your UWBiost561 package.
Then, in your DESCRIPTION file, please add igraph under Imports:. That is, your DESCRIPTION file should contain under Imports: (at minimum):
Imports:
igraph
Once you’ve done both these things, run devtools::check() to add the documentation for all the compute_maximal_partial_clique functions. (If you had no warnings or errors at the completion of your HW3, you should still have no warnings or errors now.) Then, run devtools::install() to install your new UWBiost561 package.
(There is nothing to report for this question.
You do not make these changes to your R Markdown file.)
Question 1B: Looking at two implementations you were given, summarize what you think the implementations are doing in one to four sentences each. (This is to practice reading other people’s code.)
Answer:
compute_maximal_partial_clique2: identifies the largest partial clique in a graph, given an adjacency matrix and a required edge density. It first validates the input parameters, ensuring the adjacency matrix is square, symmetric, and binary, and checks the range of the graph’s size and the edge density. The function then generates all cliques in the graph, iterates through combinations of these cliques to find the largest partial clique meeting the specified edge density, and finally returns a list of node indices constituting this clique along with the calculated edge density.
compute_maximal_partial_clique14: finds the maximal partial clique in a graph based on an adjacency matrix and a specified minimum edge density. It begins by validating the input, ensuring the adjacency matrix is square, symmetric, binary, and of appropriate size, and that the edge density parameter is within the required range.
Question 1C: In terms of coding clarity (i.e., not whether the code gives a good answer), are there recommendations you would give the original author of the code to improve their code’s clarity? This can be in documentation, variable naming, whether some code could have been factorized into their dedicated functions, if some portion of code was hard to understand, etc. Give none to two suggestions for each implementation. (You can give no suggestions if you thought the code was spectacular – even if it’s not returning a great answer, you can easily understand what the code does despite being written by someone else.)
Answer:
For compute_maximal_partial_clique2, there could be an improvement in variable naming: The variable names like clique_idx and m can be more descriptive. For instance, clique_idx could be renamed to current_clique_indices and m to num_nodes_in_clique to improve readability.
For compute_maximal_partial_clique14, separate functions like enqueue(queue, item) and dequeue(queue) would make the queue operations clearer and more abstracted from the main logic of the function. Overall, both implementations are fairly clear.
Question 1D: In HW3, you wrote down at least 5 unit tests for your implementation of compute_maximal_partial_clique(). In this R Markdown file (i.e., not in your tests folder), copy-paste your 5 unit tests and see if the two compute_maximal_partial_clique implementations you were assigned pass your unit tests. If you were assigned implementations #2 and #7, you would test the functions compute_maximal_partial_clique2() and compute_maximal_partial_clique7().
Note 1: If you have more than 5 unit tests, you only need to try 5 for this question. Since you’re testing two implementations, you’ll have 10 unit tests total.
Note 2: When copy-pasting your unit tests, you can directly copy-paste each entire test_that() call (including its contents) into this R Markdown file. (See the example below in the R Markdown file itself. Here, you can either use library(testthat) in the R Markdown file or prefix your test_that() and other functions with testthat::.)
test_that("A test that passes", {
x = 10
expect_true(x == 10)
})
#> Test passed 😸
test_that("A test that fails", {
x = 10
expect_true(x != 10)
})
#> ── Failure: A test that fails ──────────────────────────────────────────────────
#> x != 10 is not TRUE
#>
#> `actual`: FALSE
#> `expected`: TRUE
#> Error:
#> ! Test failed
Note 3: You can write two large R code chunks, one for each implementation’s 5 unit tests. Since the implementation you’ve been assigned might not necessarily pass your unit tests, please set the error = TRUE flag in your R code chunk options. (See the example below in the R Markdown file itself.)
rowSums(1:5)
#> Error in rowSums(1:5): 'x' must be an array of at least two dimensions
"asdf" + "1234"
#> Error in "asdf" + "1234": non-numeric argument to binary operator
Note 4: If you would like to test for things that weren’t in your HW3, feel free to do so. (Your tests for this question don’t literally need to be the same ones you created for HW3.)
# Test that the output list has the required names and order
test_that("Output list has required names and correct order", {
adj_mat <- matrix(sample(0:1, 25, replace = TRUE), 5, 5)
diag(adj_mat) <- 1
result <- compute_maximal_partial_clique2(adj_mat, 0.6)
expect_named(result, c("clique_idx", "edge_density"))
})
#> ── Error: Output list has required names and correct order ─────────────────────
#> Error in `compute_maximal_partial_clique2(adj_mat, 0.6)`: could not find function "compute_maximal_partial_clique2"
#> Error:
#> ! Test failed
# Test that clique_idx contains no duplicates and is within bounds
test_that("clique_idx contains no duplicates and values are within bounds", {
n <- 10
adj_mat <- matrix(sample(0:1, n*n, replace = TRUE), n, n)
diag(adj_mat) <- 1
result <- compute_maximal_partial_clique2(adj_mat, 0.5)
expect_true(length(unique(result$clique_idx)) == length(result$clique_idx))
expect_true(all(result$clique_idx >= 1 & result$clique_idx <= n))
})
#> ── Error: clique_idx contains no duplicates and values are within bounds ───────
#> Error in `compute_maximal_partial_clique2(adj_mat, 0.5)`: could not find function "compute_maximal_partial_clique2"
#> Error:
#> ! Test failed
# Test expected output structure
test_that("Output structure is correct", {
adj_mat <- diag(10)
result <- compute_maximal_partial_clique2(adj_mat, 0.5)
expect_is(result, "list")
expect_named(result, c("clique_idx", "edge_density"))
})
#> ── Error: Output structure is correct ──────────────────────────────────────────
#> Error in `compute_maximal_partial_clique2(adj_mat, 0.5)`: could not find function "compute_maximal_partial_clique2"
#> Error:
#> ! Test failed
# Test edge density calculation
test_that("Edge density calculation is accurate", {
n <- 5
adj_mat <- matrix(0, n, n)
diag(adj_mat) <- 1
adj_mat[1:3, 1:3] <- 1 # Create a fully connected subgraph of size 3
result <- compute_maximal_partial_clique2(adj_mat, 0.8)
expected_density <- 1 # Fully connected
expect_equal(result$edge_density, expected_density)
})
#> ── Error: Edge density calculation is accurate ─────────────────────────────────
#> Error in `compute_maximal_partial_clique2(adj_mat, 0.8)`: could not find function "compute_maximal_partial_clique2"
#> Error:
#> ! Test failed
# Test recovery of inserted clique
test_that("Function can recover an inserted clique", {
n <- 10
adj_mat <- matrix(0, n, n)
diag(adj_mat) <- 1
adj_mat[1:5, 1:5] <- 1 # Create a fully connected subgraph of size 5
result <- compute_maximal_partial_clique2(adj_mat, 0.8)
expect_true(all(result$clique_idx %in% 1:5))
expect_equal(length(result$clique_idx), 5)
})
#> ── Error: Function can recover an inserted clique ──────────────────────────────
#> Error in `compute_maximal_partial_clique2(adj_mat, 0.8)`: could not find function "compute_maximal_partial_clique2"
#> Error:
#> ! Test failed
# Test that the output list has the required names and order
test_that("Output list has required names and correct order", {
adj_mat <- matrix(sample(0:1, 25, replace = TRUE), 5, 5)
diag(adj_mat) <- 1
result <- compute_maximal_partial_clique(adj_mat, 0.6)
expect_named(result, c("clique_idx", "edge_density"))
})
#> ── Error: Output list has required names and correct order ─────────────────────
#> Error in `compute_maximal_partial_clique(adj_mat, 0.6)`: could not find function "compute_maximal_partial_clique"
#> Error:
#> ! Test failed
# Test that clique_idx contains no duplicates and is within bounds
test_that("clique_idx contains no duplicates and values are within bounds", {
n <- 10
adj_mat <- matrix(sample(0:1, n*n, replace = TRUE), n, n)
diag(adj_mat) <- 1
result <- compute_maximal_partial_clique14(adj_mat, 0.5)
expect_true(length(unique(result$clique_idx)) == length(result$clique_idx))
expect_true(all(result$clique_idx >= 1 & result$clique_idx <= n))
})
#> ── Error: clique_idx contains no duplicates and values are within bounds ───────
#> Error in `compute_maximal_partial_clique14(adj_mat, 0.5)`: could not find function "compute_maximal_partial_clique14"
#> Error:
#> ! Test failed
# Test expected output structure
test_that("Output structure is correct", {
adj_mat <- diag(10)
result <- compute_maximal_partial_clique14(adj_mat, 0.5)
expect_is(result, "list")
expect_named(result, c("clique_idx", "edge_density"))
})
#> ── Error: Output structure is correct ──────────────────────────────────────────
#> Error in `compute_maximal_partial_clique14(adj_mat, 0.5)`: could not find function "compute_maximal_partial_clique14"
#> Error:
#> ! Test failed
# Test edge density calculation
test_that("Edge density calculation is accurate", {
n <- 5
adj_mat <- matrix(0, n, n)
diag(adj_mat) <- 1
adj_mat[1:3, 1:3] <- 1 # Create a fully connected subgraph of size 3
result <- compute_maximal_partial_clique14(adj_mat, 0.8)
expected_density <- 1 # Fully connected
expect_equal(result$edge_density, expected_density)
})
#> ── Error: Edge density calculation is accurate ─────────────────────────────────
#> Error in `compute_maximal_partial_clique14(adj_mat, 0.8)`: could not find function "compute_maximal_partial_clique14"
#> Error:
#> ! Test failed
# Test recovery of inserted clique
test_that("Function can recover an inserted clique", {
n <- 10
adj_mat <- matrix(0, n, n)
diag(adj_mat) <- 1
adj_mat[1:5, 1:5] <- 1 # Create a fully connected subgraph of size 5
result <- compute_maximal_partial_clique14(adj_mat, 0.8)
expect_true(all(result$clique_idx %in% 1:5))
expect_equal(length(result$clique_idx), 5)
})
#> ── Error: Function can recover an inserted clique ──────────────────────────────
#> Error in `compute_maximal_partial_clique14(adj_mat, 0.8)`: could not find function "compute_maximal_partial_clique14"
#> Error:
#> ! Test failed
Intent: The intent of this question is to construct a simulation study that you’ll run on Bayes.
In this question, you’ll be designing a simulation. You will be doing this on Bayes for a couple of reasons: 1) To give you experience running code on Bayes, and 2) because your simulations might take a couple of hours to run.
Note: inside the file compute_maximal_partial_clique_master.R (which should now be in your R folder), I’ve provided you two functions: compute_maximal_partial_clique_master() and compute_correct_density(). You will find both functions useful for Q2 and Q3.
Question 2A:
Please run the following code. If you cannot run it, something has gone wrong with your UWBiost561 package. (For instance, it might be because your generate_partial_clique() function does not work or is not correctly located in your R package.)
library(UWBiost561)
set.seed(10)
data <- UWBiost561::generate_partial_clique(n = 10,
clique_fraction = 0.5,
clique_edge_density = 0.95)
set.seed(10)
result1 <- UWBiost561::compute_maximal_partial_clique_master(
adj_mat = data$adj_mat,
alpha = 0.95,
number = 11,
time_limit = 30
)
result1
set.seed(10)
result2 <- UWBiost561::compute_maximal_partial_clique11(
adj_mat = data$adj_mat,
alpha = 0.95
)
result2
As you can see, the compute_maximal_partial_clique_master() function: 1) takes a number argument (which allows you to control which implementation of compute_maximal_partial_clique you’re using) and 2) sets a timer (here, of 30 seconds)
so that the function terminates in at most 30 seconds.
Using the provided code as a framework, use compute_maximal_partial_clique_master() with a setting of time_limit such that the function terminates prematurely, i.e., time_limit is set to be a small number. (For example, hypothetically, the function might require 40 seconds to complete while you set time_limit to only 30 seconds.) By changing the number argument, you can use any implementation of compute_maximal_partial_clique you prefer for this question.
Note: For this question, you want to generate an adjacency matrix where n is large (i.e., a value close to 50), but time_limit is very small (i.e., just a few seconds). You deliberately want to cause a timed_out status.
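To illustrate the general idea of a time limit, here is a small base-R sketch. It is not how compute_maximal_partial_clique_master() is implemented (the source does not say); the wrapper name `run_with_time_limit` and the `slow_task` stand-in are hypothetical, and only the `timed_out` status label is taken from the note above.

```r
# Hypothetical wrapper: run fun(), but give up after time_limit seconds.
# Uses base R's setTimeLimit(); the real master function may work differently.
run_with_time_limit <- function(fun, time_limit) {
  setTimeLimit(elapsed = time_limit, transient = TRUE)
  # Always remove the limit again when this wrapper exits.
  on.exit(setTimeLimit(elapsed = Inf, transient = FALSE))
  tryCatch(
    list(status = "completed", value = fun()),
    error = function(e) {
      # R signals a time-out as an error with this message (English locale).
      timed_out <- grepl("reached elapsed time limit", conditionMessage(e))
      list(status = if (timed_out) "timed_out" else "error", value = NULL)
    }
  )
}

# Stand-in for a slow implementation: roughly 5 seconds of work.
slow_task <- function() {
  for (i in 1:500) Sys.sleep(0.01)
  "done"
}

slow_status  <- run_with_time_limit(slow_task, time_limit = 0.5)$status
quick_status <- run_with_time_limit(function() "quick", time_limit = 5)$status
slow_status
quick_status
```

With a limit much smaller than the work required, the slow task is expected to be cut off and reported as timed out, while the quick task completes normally. The same contrast is what this question asks you to produce with a large n and a small time_limit.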
Question 2B: Describe what you would like your simulation to study in a few sentences. Specifically, you are making a simulation plan (see Lecture 8 for details).
Your plan should answer the following questions:
- generate_partial_clique() function, and then you would describe what kind of graph your generate_partial_clique() makes.)
- n, clique_fraction, clique_edge_density, and/or alpha. Your simulation study can focus on changing one or two values.)
- compute_maximal_partial_clique, but please write this down for thoroughness.)

The only hard requirements I am imposing for this simulation study are:

- compute_maximal_partial_clique in your simulation study, where each method is used on every adjacency matrix you generate.
- n (the number of nodes), the largest n you consider in your simulation study is n=50.
- compute_maximal_partial_clique_master() when using each of the 25 implementations (instead of calling the compute_maximal_partial_clique implementations directly). Additionally, please do not set time_limit to be larger than 30 (i.e., regardless of how large your simulation study gets, please do not allow more than 30 seconds for any implementation). There are 25 implementations, and I wouldn’t want just one trial to take more than 15 minutes!

Some ideas of what you can test for are: how often does each method get the maximal partial clique (among all 25 methods) when the number of nodes n changes, or when alpha changes? You can also try incorporating how fast an implementation is (in terms of time) into your simulation study.
Feel free to post on Canvas Discussions your thoughts on the simulation study if you are unsure about your simulation study.
Note 1: If you choose to vary n in your simulation study, a value of n of 30 or more will make your simulation take a long time to finish.
Note 2: You will use the compute_maximal_partial_clique_master(), which can time a function out after a time_limit number of seconds (by default, 30 seconds). Therefore, the number of problem instances you will solve is, roughly speaking, (the number of levels) x (the number of methods) x (the number of trials per level), and the maximum time your simulation would need would be (the number of levels) x (the number of methods) x (the number of trials per level) x 30 seconds.
Note 3: Please design a straightforward or complicated simulation study appropriate to your comfort in coding. If you are overwhelmed by this homework, you can code a simulation study that only takes a few minutes to complete. (You can use the demo in class as a rough skeleton for your simulation study. Of course, you can design your own way to perform a simulation study as well.)
Note 4: For the simulation study, you will (of course) need to generate random adjacency matrices. You can use the generate_partial_clique() you already created for HW3, but you can also modify generate_partial_clique() to better suit your simulation study’s goals.
Note 5: If you feel overwhelmed, you can more-or-less plan a simulation study that is very similar to the demo done in class (Lecture 8). Specifically, inside the hw4_implementations.zip, there are files hw4-demo_bayes_execute.R, hw4-demo_bayes_execute.slurm, and hw4-demo_bayes_plot.R that effectively tell you how to do a specific simulation study. Feel free to use these scripts as closely as you want for your simulation study. (However, you would still need to make a simulation function and follow the guidelines, so there is still some non-trivial work to do.)
(This is a writing question, not a coding question.)
Answer:
To evaluate the performance and efficiency of 25 different implementations of the compute_maximal_partial_clique function, I will conduct a comprehensive simulation study. This study aims to investigate how often each method successfully identifies the maximal partial clique and how their performance varies with changes in graph size and edge density.
For generating random graphs, I will use the generate_partial_clique function. This function creates graphs with a specified number of nodes (n), a fraction of nodes forming a clique (clique_fraction), and a specific edge density within the clique (clique_edge_density). In this study, I will focus on graphs with three different sizes: 30, 40, and 50 nodes. The clique fraction will be set to 0.5, ensuring that half of the nodes form a partial clique. The edge densities within the cliques will be varied at 0.8, 0.9, and 0.95 to observe the effects of different densities on the performance of the implementations.
Each level of analysis will involve varying the graph size and the clique edge density, resulting in nine distinct combinations. For each combination, I will conduct two trials to ensure the robustness of the results. In total, this will amount to 18 trials. Each trial will test all 25 implementations of the compute_maximal_partial_clique function. The function compute_maximal_partial_clique_master() will be used to manage these implementations, with a time limit set to a small value, such as 15 seconds, to ensure that the function terminates within a reasonable period, even if it fails to complete.
The primary goal of this simulation study is to identify which implementations consistently find the maximal partial clique and to measure the time taken by each method. Additionally, I will examine how often each implementation terminates due to the time limit and whether any patterns or trends emerge across different graph sizes and edge densities. With the time limit for each implementation set to 15 seconds and considering the constraints of the study, the total estimated time for the simulation will be around 1.875 hours, ensuring that the study remains feasible within the provided 5-hour limit. This simulation study will provide valuable insights into the efficiency and reliability of different algorithms for finding maximal partial cliques in large graphs.
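The runtime estimate in this plan follows the worst-case formula from Note 2, (levels) x (methods) x (trials per level) x (time limit). A short sketch of that arithmetic, using the numbers stated in the answer (the variable names are only for illustration):

```r
n_levels   <- 3 * 3  # 3 graph sizes x 3 clique edge densities
n_trials   <- 2      # trials per level
n_methods  <- 25     # implementations run on every adjacency matrix
time_limit <- 15     # seconds allowed per implementation

# Worst case: every single run uses its full time limit.
n_runs <- n_levels * n_trials * n_methods
worst_case_hours <- n_runs * time_limit / 3600

n_runs            # 450
worst_case_hours  # 1.875
```

This is an upper bound; implementations that finish or fail early would shorten the actual run.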
Question 2C: In your R folder, design a function that executes your intended simulation plan. (I am purposely being loose and very open-ended about designing this simulation regarding the inputs or outputs. In contrast to HW3, where everything was spelled out explicitly, I am now giving you the task of meaningfully designing the inputs/outputs.)
Note 1: You want to use compute_maximal_partial_clique_master() in your simulation. This would make your life easier when switching between different implementations. The main difficulties in this question are:
- compute_maximal_partial_clique_master()
- adj_mat adjacency matrices themselves)? This will highly depend on what your simulation plan is trying to study!
- compute_maximal_partial_clique_master(): A) times out, B) errors, or C) outputs an alleged clique_idx that is invalid because it doesn’t form a partial clique with edge density alpha or larger.

Note 2: Be aware – you should be a healthy skeptic when using other people’s code in a simulation. Just because a method claims it found a partial clique, you should verify that it is indeed a partial clique with edge density of at least alpha. You might want to use the compute_correct_density() function provided in compute_maximal_partial_clique_master.R.
Note 3: I am asking you to make a function to perform your simulation (instead of just an R script) since I will be asking you to test your simulation function in Question 2D below.
Note 4: You should allow your simulation function to take as input the different levels of your simulation study and the number of trials (and other parameters of your choosing). That way, you can more easily “test” a “simple” simulation study in Question 2D below.
Note 5: You (ideally) want to set up your simulation such that for each trial in a level, you generate one random adjacency matrix that is used in all 25 different implementations of compute_maximal_partial_clique. This gives a “fair” comparison across all 25 methods. You also might want to set the random seed (via set.seed()) before generating that random adjacency matrix for this trial so you can easily reproduce the results from a particular trial if needed.
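One possible shape for such a simulation function is sketched below. Everything in it is hypothetical and only illustrates the ideas in Notes 2, 4, and 5: `run_simulation_sketch`, `density_of`, `make_graph`, and the three toy methods are stand-ins, not the course's functions. In the actual homework, the calls would go through compute_maximal_partial_clique_master() instead.

```r
# Hypothetical density helper (a single node has density 1, per Q1's Note 1).
density_of <- function(adj_mat, idx) {
  m <- length(idx)
  if (m == 1) return(1)
  sub <- adj_mat[idx, idx, drop = FALSE]
  sum(sub[upper.tri(sub)]) / choose(m, 2)
}

# Hypothetical simulation skeleton. `methods` is a named list of functions
# taking (adj_mat, alpha) and returning list(clique_idx = ...).
run_simulation_sketch <- function(levels, n_trials, methods, make_graph, alpha) {
  rows <- list()
  for (level in levels) {
    for (trial in seq_len(n_trials)) {
      set.seed(trial)               # makes each trial reproducible
      adj_mat <- make_graph(level)  # one graph shared by all methods
      for (name in names(methods)) {
        # An erroring method yields NULL instead of stopping the simulation.
        out <- tryCatch(methods[[name]](adj_mat, alpha), error = function(e) NULL)
        idx <- out$clique_idx
        # Healthy skepticism: re-check the claimed clique ourselves.
        valid <- !is.null(idx) && length(idx) >= 1 &&
          density_of(adj_mat, idx) >= alpha
        rows[[length(rows) + 1]] <- data.frame(
          level = level, trial = trial, method = name, valid = valid,
          clique_size = if (valid) length(idx) else NA_integer_
        )
      }
    }
  }
  do.call(rbind, rows)
}

# Toy inputs: random symmetric 0/1 graphs and three stand-in "methods".
make_graph <- function(n) {
  adj_mat <- matrix(0, n, n)
  upper <- upper.tri(adj_mat)
  adj_mat[upper] <- rbinom(sum(upper), 1, 0.5)
  adj_mat <- adj_mat + t(adj_mat)
  diag(adj_mat) <- 1
  adj_mat
}
toy_methods <- list(
  single_node = function(adj_mat, alpha) list(clique_idx = 1),
  all_nodes   = function(adj_mat, alpha) list(clique_idx = seq_len(nrow(adj_mat))),
  broken      = function(adj_mat, alpha) stop("not implemented")
)

results <- run_simulation_sketch(levels = c(5, 6), n_trials = 2,
                                 methods = toy_methods,
                                 make_graph = make_graph, alpha = 0.9)
results
```

Taking the levels and the number of trials as arguments keeps a tiny version of the study (as in this toy call) cheap enough to use in unit tests.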
Note 6: This question is the hardest part of HW4. Since this question is purposely more open-ended, it is harder to use ChatGPT to help you with this function. However, most of your future coding experiences will be open-ended, so practicing “coding with ChatGPT” in more unstructured settings is good. I encourage you to work with your classmates on this question (but, as per our syllabus rules, do not directly copy your classmate’s code).
(There is nothing to report for this question. Your code will be in the R folder, not
in this R Markdown file.)
Question 2D: Create a few (more than one) unit tests for your simulation function. This unit test (like your unit tests in HW3) will be in your tests/testthat folder. I am also purposely vague about how many tests or what kind of unit tests to write. This is for you to decide! After all, you’re about to unleash your code to perform a (potentially long) simulation test, so you hope you’ve tested your simulation code well enough for this to be a good use of time.
Note: Your unit tests should be fast (i.e., take no more than a minute to run). This means your unit tests should not be performing your complete simulation study. After all, it’ll be a useless unit test if it takes an hour to figure out if your unit test passed. However, you want your unit tests (which take less than a minute) to give you confidence that your code will work when running the complete simulation study (which might take more than an hour).
(There is nothing to report for this question. Your code will be in the tests/testthat folder, not
in this R Markdown file.)
Question 2E: Similar to what you did in HW3, please include a screenshot of your output after running devtools::check(). You should still have a UWBiost561 package
that passes all the checks and your unit tests.
(As a guideline, one screenshot should show the first 20-or-so lines of your
devtools::check() results, and a second screenshot should show the last
20-or-so lines of your results.) You can use the knitr::include_graphics() function to include
figures inside this R Markdown file.
The intent of this question is to provide “evidence” that your devtools::check()
went smoothly. You do not need to worry about what your screenshots show specifically.
knitr::include_graphics("/Users/tatithegreat/Documents/UW/BIOST561/UWBiost561/Images/Screenshot8.png")
knitr::include_graphics("/Users/tatithegreat/Documents/UW/BIOST561/UWBiost561/Images/Screenshot9.png")
knitr::include_graphics("/Users/tatithegreat/Documents/UW/BIOST561/UWBiost561/Images/Screenshot10.png")
knitr::include_graphics("/Users/tatithegreat/Documents/UW/BIOST561/UWBiost561/Images/Screenshot11.png")
knitr::include_graphics("/Users/tatithegreat/Documents/UW/BIOST561/UWBiost561/Images/Screenshot12.png")
knitr::include_graphics("/Users/tatithegreat/Documents/UW/BIOST561/UWBiost561/Images/Screenshot13.png")
knitr::include_graphics("/Users/tatithegreat/Documents/UW/BIOST561/UWBiost561/Images/Screenshot14.png")
# Q3: Performing the simulation study
Intent: The intent of this question is to perform the simulation study and to practice visualizing simulation results.
Now that you’ve written a function to perform your simulation study, you want to write more code to perform the actual simulation study. This will technically involve at least 3 new files in your vignettes folder: one .R file to perform the simulation study, one accompanying .slurm script to submit the above .R file to Bayes, and one .R file to load the simulation results and visualize the results. The following questions will guide you through these three files.
Question 3A: In your vignettes folder, make a file called HW4_simulation_execute.R, which will be an R script that: 1) loads your UWBiost561 package, and 2) executes your simulation function inside your UWBiost561 package.
This simulation will be the “full” simulation study you outlined in Question 2B above.
(This is the specific .R file that might take a couple of hours to finish, depending on how complex your simulation study is.) The end of your HW4_simulation_execute.R should save your simulation results as HW4_simulation.RData, an .RData file. (For example, the argument to the save() function could be file = "~/HW4_simulation.RData". This would save the results to a HW4_simulation.RData file under your home directory on Bayes.)
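As a rough sketch of the saving step, the snippet below writes a placeholder object to an .RData file and reads it back. The `results` data frame and the `date_of_run` variable are made up for illustration, and the file is written to a temporary directory so the sketch can be run anywhere; on Bayes the path would instead be something like "~/HW4_simulation.RData".

```r
# Placeholder standing in for whatever your simulation function returns.
results <- data.frame(method = c(1, 2), clique_size = c(4, 5))
date_of_run <- Sys.time()

# Temporary location for this sketch only.
out_file <- file.path(tempdir(), "HW4_simulation.RData")
save(results, date_of_run, file = out_file)

# Later, e.g. in the plotting script: load() restores the objects by name
# and returns the names of everything it loaded.
loaded_names <- load(out_file)
loaded_names
```

Saving a timestamp alongside the results is optional, but it can help confirm which run a given .RData file came from.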
Then, make an accompanying HW4_simulation_execute.slurm that submits your HW4_simulation_execute.R as a job to Bayes. Please limit yourself to 10gb of memory (i.e., a line in HW4_simulation_execute.slurm should read #SBATCH --mem-per-cpu=10gb). (Your simulation study should not require more than 10 Gigabytes of memory. If it does, you are likely not being practical in your simulation study.)
(There is nothing to report for this question. Your code will be in the vignettes folder, not
in this R Markdown file.)
See Lectures 7 and 8 for more details on how to do this.
Question 3B: Now, install both igraph and your UWBiost561 package in R.
The igraph package can be installed as usual (i.e., via install.packages("igraph") in an interactive R session on Bayes). Lecture 8 and https://docs.google.com/document/d/1-ZTDPp39zbhsKbQ2FfBoCtiXKQj0jQSg75MAatYfiGI/edit?usp=sharing will give you a few options on how to install your UWBiost561 package on Bayes.
(There is nothing to report for this question.)
Question 3C: Now, run your simulation study. In the terminal, this means you will 1) navigate to your UWBiost561/vignettes folder on Bayes (this might be something like cd ~/UWBiost561/vignettes) and 2) run the command: sbatch HW4_simulation_execute.slurm.
Note 1: This question might cause you pain since you might not know if your script is working as intended. (Additionally, suppose you did not test your simulation function thoroughly. In that case, your HW4_simulation_execute.slurm script might crash, and you’ll be forced to debug your UWBiost561 package and re-install it, etc. It’s a pain. Trust me, you want to test your simulation function thoroughly before you do this Question.)
Note 2: Do not parallelize your simulation across multiple cores or write a batch SLURM script to submit multiple jobs for this question. (Many of you will likely be working on the homework at similar times, and I cannot guarantee that the server can allow an entire course of students to submit parallelized jobs simultaneously.) If this note does not make any sense to you, don’t worry about it.
(There is nothing to report for this question. You are simply running your HW4_simulation_execute.slurm file.)
Question 3D: Finally, create a file called HW4_simulation_plot.R under the vignettes folder. You will design this script to load your saved results in HW4_simulation.RData, visualize the results, and save the plot into your vignettes folder. Name this one plot as HW4_simulation.png. (You can save your plot in a different file format, but .png is the easiest one to work with.)
How complicated should this script be? It depends on the results you saved in HW4_simulation.RData. For example, if you saved the clique size of each method in each trial for each level, then you might need to compute the average clique size across all the trials for each level in this script.
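For instance, if the saved results were in a "long" format with one row per level, trial, and method, the averaging step could look like the sketch below. The data frame here is made up purely for illustration, and its column names are assumptions, not a required layout.

```r
# Made-up results: 2 levels of n, 2 trials each, 2 methods.
results <- data.frame(
  n           = rep(c(10, 20), each = 4),
  trial       = rep(c(1, 2), times = 4),
  method      = rep(rep(c("m1", "m2"), each = 2), times = 2),
  clique_size = c(4, 5, 3, 5, 8, 6, 7, 7)
)

# Average clique size across trials, for each level and method.
avg <- aggregate(clique_size ~ n + method, data = results, FUN = mean)
avg
```

The resulting table has one row per level and method, which is usually a convenient shape to pass to a plotting function.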
What should you plot? It depends on what your simulation plan was intended to study (which you wrote in Question 2B). I’m leaving this open-ended so you can practice determining how best to demonstrate the results you’re trying to study.
Note 1: You can run your HW4_simulation_plot.R script interactively on Bayes. You do not need to write a .slurm script for this if you prefer not to (since your entire HW4_simulation_plot.R script should take only a few minutes to run. After all, you’re simply loading in and plotting the results. There shouldn’t be any very fancy computation being performed in this script, and indeed, you shouldn’t be computing any maximal partial cliques in this script).
Note 2: Feel free to consult https://r-graph-gallery.com/ to get ideas on how to meaningfully visualize your results.
(There is nothing to report for this question. Your code will be in the vignettes folder, not
in this R Markdown file.)
Question 3E: Now, on Bayes, there should be at least four new files in the vignettes folder due to Question 3A-3D: HW4_simulation_execute.R, HW4_simulation_execute.slurm, HW4_simulation_plot.R, and HW4_simulation.png. (There might be many other files, but I am okay with it if there are more files than needed.) Commit and push all these files via Git (using the command line on Bayes) onto GitHub.com (via git push), and then pull all these files via Git onto your local laptop. (You can pull either through the command line via git pull or the RStudio GUI.)
Finally, include your plot in this R Markdown file. (I’m having you put your plot in your vignettes folder and push/pull your UWBiost561 package because I’m assuming you knit this HW4.Rmd file locally on your laptop, but you need to somehow get your plot from Bayes onto your local laptop.)
knitr::include_graphics("/Users/tatithegreat/Documents/UW/BIOST561/UWBiost561/vignettes/HW4_simulation_execute_plot.png")
# Q4: Describing your final project
Intent: The intent of this question is to make sure you have a plan for the final project.
I will release the final project specifications (again, which involve making a PkgDown website of any R package of your choosing of any scope) on Canvas. This will be released (at the latest) by May 19th.
Please write (in one to five sentences) what you are thinking of doing for your final project. If your answer hasn’t changed since HW3, you can copy-paste your answer from HW3 into this question for HW4.
Answer:
I’ll be making a pkgdown website for my IdeasCustom package, which analyzes gene expression data at the individual level using the IDEAS method and the decomposed components of the Wasserstein-2 distance. It includes functions to arrange gene expression data by donors, initialize distance array lists, compute divergence metrics using the Wasserstein distance, and more.
This “question” is an additional way for students to communicate with instructors. You could include positive feedback about topics you enjoyed learning in this module, critiques about the course difficulty/pacing, or some questions/confusions you had about course material. Your feedback can help shape the course for the rest of this quarter and future years. Please be mindful and polite when providing feedback. You may leave this question blank.